Problem description

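While starting Docker containers today, I found that after a while the root filesystem of every container on the host had turned read-only, and the host's messages log was reporting disk-related errors.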

The mount output inside a container:

[root@zk-1 ~]# mount
/dev/mapper/docker-253:0-4298664622-7830c39693a73c13e80cf2a22a46558b22adcc5adf94ca1c893f44ece878601e on / type ext4 (ro,relatime,stripe=16,data=ordered)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
/dev/mapper/centos-root on /nfsc type xfs (rw,relatime,attr2,inode64,noquota)
tmpfs on /run/secrets type tmpfs (rw,nosuid,nodev,noexec,relatime)
/dev/mapper/centos-root on /etc/resolv.conf type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/centos-root on /etc/hostname type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/centos-root on /etc/hosts type xfs (rw,relatime,attr2,inode64,noquota)

The errors logged on the host:

Jan 13 15:06:01 docker2 systemd: Starting Session 6 of user root.
Jan 13 15:06:01 docker2 systemd: Started Session 6 of user root.
Jan 13 15:06:31 docker2 systemd: Starting Session 7 of user root.
Jan 13 15:06:31 docker2 systemd-logind: New session 7 of user root.
Jan 13 15:06:31 docker2 systemd: Started Session 7 of user root.
Jan 13 15:07:01 docker2 systemd: Starting Session 8 of user root.
Jan 13 15:07:01 docker2 systemd: Started Session 8 of user root.
Jan 13 15:07:38 docker2 kernel: device-mapper: thin: 253:3: reached low water mark for data device: sending event.
Jan 13 15:07:44 docker2 kernel: device-mapper: thin: 253:3: switching pool to out-of-data-space (queue IO) mode
Jan 13 15:08:01 docker2 systemd: Starting Session 9 of user root.
Jan 13 15:08:01 docker2 systemd: Started Session 9 of user root.
Jan 13 15:08:44 docker2 kernel: device-mapper: thin: 253:3: switching pool to out-of-data-space (error IO) mode
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269408)
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269408
Jan 13 15:08:44 docker2 kernel: Aborting journal on device dm-4-8.
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269409
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269410
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-4): ext4_journal_check_start:56: Detected aborted journal
Jan 13 15:08:44 docker2 kernel: EXT4-fs (dm-4): Remounting filesystem read-only
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269411
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269412
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269413
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269414
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269415
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269416
Jan 13 15:08:44 docker2 kernel: Buffer I/O error on device dm-5, logical block 9269417
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269424)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269440)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269456)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269472)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269488)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269504)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269520)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269536)
Jan 13 15:08:44 docker2 kernel: EXT4-fs warning (device dm-5): ext4_end_bio:332: I/O error -28 writing to inode 1320256 (offset 13077839872 size 8388608 starting block 9269552)
Jan 13 15:08:44 docker2 kernel: Aborting journal on device dm-5-8.
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-5) in ext4_da_write_end:2782: IO failure
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-5): ext4_journal_check_start:56: Detected aborted journal
Jan 13 15:08:44 docker2 kernel: EXT4-fs (dm-5): Remounting filesystem read-only
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-5) in ext4_do_update_inode:4504: Journal has aborted
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-5): mpage_map_and_submit_extent:2229: comm kworker/u98:3: Failed to mark inode 1320256 dirty
Jan 13 15:08:44 docker2 kernel: EXT4-fs error (device dm-5) in ext4_writepages:2520: IO failure
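The thin-pool messages are easy to lose among the repeated ext4 errors. The filter below is a minimal sketch for pulling out only the pool state changes and the read-only remounts. The function name thin_pool_events is made up for illustration, and the log path in the usage comment is an assumption based on this CentOS host.

```shell
# Hypothetical helper: keep only the thin-pool state changes and the
# read-only remount events from a syslog stream read on stdin.
thin_pool_events() {
    grep -E 'device-mapper: thin:|Remounting filesystem read-only'
}

# Usage on the host (path assumed, adjust for your distribution):
#   thin_pool_events < /var/log/messages
```

Read in order, the filtered lines tell the story: low water mark reached, pool out of data space with I/O queued, pool switched to erroring I/O, then each ext4 filesystem remounted read-only.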

My first reaction was to check disk usage

[root@zk-1 ~]# df -Th
Filesystem           Type   Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:0-4298664622-7830c39693a73c13e80cf2a22a46558b22adcc5adf94ca1c893f44ece878601e
                     ext4    99G   49G   46G  52% /
tmpfs                tmpfs  126G     0  126G   0% /dev
shm                  tmpfs   64M     0   64M   0% /dev/shm
tmpfs                tmpfs  126G     0  126G   0% /sys/fs/cgroup
/dev/mapper/centos-root
                     xfs    1.7T  113G  1.6T   7% /nfsc
tmpfs                tmpfs  126G     0  126G   0% /run/secrets
/dev/mapper/centos-root
                     xfs    1.7T  113G  1.6T   7% /etc/resolv.conf
/dev/mapper/centos-root
                     xfs    1.7T  113G  1.6T   7% /etc/hostname
/dev/mapper/centos-root
                     xfs    1.7T  113G  1.6T   7% /etc/hosts
                 

The root filesystem still had 46G free, which seemed very strange. After a lot of searching online I finally found an explanation.
Reference article: http://jpetazzo.github.io/201...

Root cause

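When the docker service starts with the devicemapper driver in its default loopback setup, it creates a 107.4G (100 GiB) data file, and everything the containers write afterwards is stored in that one file. Once the data written by all containers together exceeds 107.4G, there is no space left for any of them: writes fail with error -28 (ENOSPC, as seen in the kernel log), ext4 aborts its journal, and every container's root filesystem is remounted read-only. This is why df inside a container can still show 46G free while the shared pool behind it is already full.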
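The odd-looking 107.4G figure is just 100 GiB written in decimal units. A quick check of the arithmetic in plain shell, with nothing Docker-specific involved:

```shell
# 100 GiB expressed in bytes
bytes=$((100 * 1024 * 1024 * 1024))
echo "$bytes"    # 107374182400

# The same size in decimal gigabytes (10^9 bytes), one decimal place
awk -v b="$bytes" 'BEGIN { printf "%.1f GB\n", b / 1e9 }'    # 107.4 GB
```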

The docker info output on the host:

[root@docker2 ~]# docker info
Containers: 169
Images: 1672
Storage Driver: devicemapper
 Pool Name: docker-253:0-4298664622-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 107.4 GB
 Data Space Total: 107.4 GB
 Data Space Available: 0 B
 Metadata Space Used: 137.4 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.01 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2015-10-14)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.1-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 40
Total Memory: 251.9 GiB
Name: docker2.stg.1qianbao.com
ID: JMZF:IQ6H:RDBK:XNSN:W3IO:ZAQH:RRFB:XRIT:4I72:KOKD:R34K:FD5L
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
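The lines that matter in this output are the three "Data Space" values. As a rough sketch, they can be extracted and checked with the helpers below. The function names are hypothetical, and the patterns assume the docker info layout shown above, which may differ in other Docker versions.

```shell
# Hypothetical helper: print the thin-pool data usage lines from
# `docker info` output supplied on stdin, with leading spaces removed.
pool_data_usage() {
    grep -E '^ *Data Space (Used|Total|Available):' | sed 's/^ *//'
}

# Hypothetical check: succeed only if the pool reports no free data space.
pool_is_full() {
    grep -Eq '^ *Data Space Available: 0 B$'
}

# Usage on the host:
#   docker info 2>/dev/null | pool_data_usage
#   docker info 2>/dev/null | pool_is_full && echo "thin pool exhausted"
```

Running a check like this periodically would have flagged the problem before the pool reached 0 B available.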

Because I run a lot of containers (169 JBoss applications), this took down the whole environment.

Solution

  • Stop the docker service

service docker stop

  • Delete everything under /var/lib/docker (this removes all of your images and containers, so first back up any images you need or push them to a registry)

rm -rf /var/lib/docker/*
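Since the rm above cannot be undone, images worth keeping can be exported to tar files first, while the daemon is still running. The sketch below relies on docker save -o and docker load -i. The helper names, image names, and destination directory are all placeholders, and the destination must be outside /var/lib/docker.

```shell
# Hypothetical helper: turn an image reference such as
# "registry.example.com/app/jboss:1.0" into a safe tar file name.
image_to_filename() {
    printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# Hypothetical helper: save each image named after the first argument
# into the directory given as the first argument.
# Restore later with: docker load -i FILE
backup_images() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for image in "$@"; do
        docker save -o "$dest/$(image_to_filename "$image")" "$image" || return 1
    done
}

# Usage (image names and destination are placeholders):
#   backup_images /backup/docker-images centos:7 registry.example.com/app/jboss:1.0
```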

  • Create /var/lib/docker/devicemapper/devicemapper/data from a larger file, disk, or logical volume

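    1. Using a file: dd if=/dev/zero of=/var/lib/docker/devicemapper/devicemapper/data bs=1G count=0 seek=1000 creates a sparse data file with an apparent size of 1000G. If you drop the seek parameter and set count to 1000 instead, dd writes out a fully allocated 1000G file.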

    2. Using a disk: ln -s /dev/sdb /var/lib/docker/devicemapper/devicemapper/data

    3. Using a logical volume: ln -s /dev/mapper/centos-dockerdata /var/lib/docker/devicemapper/devicemapper/data
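The difference between a sparse file and a fully allocated one is easy to see on a small scale. The sketch below uses a scratch file in a temporary directory (not the real Docker path) and 100 MiB instead of 1000G. It assumes GNU coreutils dd and stat, as found on CentOS.

```shell
# Work in a throwaway directory so nothing under /var/lib/docker is touched.
tmpdir=$(mktemp -d)
sparse="$tmpdir/data"

# count=0 writes no data; seek=100 places the end of file at 100 * 1 MiB.
dd if=/dev/zero of="$sparse" bs=1M count=0 seek=100 2>/dev/null

# Apparent size in bytes (what ls -l reports)
apparent=$(stat -c %s "$sparse")
# Space actually allocated on disk: block count times block unit size
allocated=$(( $(stat -c %b "$sparse") * $(stat -c %B "$sparse") ))

echo "apparent=$apparent allocated=$allocated"
rm -rf "$tmpdir"
```

The apparent size is the full 100 MiB while the allocated size stays far smaller, which is why a data file can be declared larger than the space currently in use. The catch is that the file still grows as containers write, so the underlying filesystem has to have room for it.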

I went with the first method and created a 1.6T sparse file

mkdir -p /var/lib/docker/devicemapper/devicemapper/
dd if=/dev/zero  of=/var/lib/docker/devicemapper/devicemapper/data bs=1G count=0 seek=1600

Once the file is created, start the docker service

service docker start

Now check the data pool in docker info again

[root@docker2 ~]# docker info
Containers: 169
Images: 1701
Storage Driver: devicemapper
 Pool Name: docker-253:0-2355438-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 90.9 GB
 Data Space Total: 1.611 TB
 Data Space Available: 1.52 TB
 Metadata Space Used: 147.5 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2015-10-14)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.1-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 40
Total Memory: 251.9 GiB
Name: docker2.stg.1qianbao.com
ID: JMZF:IQ6H:RDBK:XNSN:W3IO:ZAQH:RRFB:XRIT:4I72:KOKD:R34K:FD5L
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

From this point on, the size of the data file determines how much space is available to all containers on the host combined.

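Separately, the docker daemon's --storage-opt startup option can set the initial disk size of each container, for example --storage-opt dm.basesize=80G, which gives every container a root filesystem with a total size of 80G: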

[root@zk-1 ~]# df -Th
Filesystem           Type   Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:0-27661746-8b7f953fb4759982ad82235c27e39dfe7190b55180d63cbcf3aa2fdc6569d43a
                     ext4    79G  785M   74G   2% /
tmpfs                tmpfs  126G     0  126G   0% /dev
shm                  tmpfs   64M     0   64M   0% /dev/shm
tmpfs                tmpfs  126G     0  126G   0% /sys/fs/cgroup
tmpfs                tmpfs  126G     0  126G   0% /run/secrets
/dev/mapper/centos-root
                     xfs    1.5T  338G  1.2T  23% /wls/wls81/zookeeper.out
/dev/mapper/centos-root
                     xfs    1.5T  338G  1.2T  23% /etc/resolv.conf
/dev/mapper/centos-root
                     xfs    1.5T  338G  1.2T  23% /etc/hostname
/dev/mapper/centos-root
                     xfs    1.5T  338G  1.2T  23% /etc/hosts
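Where the dm.basesize flag goes depends on how the daemon is launched. As one possible arrangement, and only an assumption about this host, the CentOS 7 docker package reads storage flags from /etc/sysconfig/docker-storage. Installs that start the daemon some other way would pass the same flag on the daemon command line instead.

```shell
# Assumed location: /etc/sysconfig/docker-storage (CentOS 7 docker package).
# The flag itself is the one named in the text above.
DOCKER_STORAGE_OPTIONS="--storage-opt dm.basesize=80G"
```

The base size only caps each container's root filesystem. The sum of what all containers actually write is still bounded by the size of the data file.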
